ON THE LOCAL CONVERGENCE OF PATTERN SEARCH


ELIZABETH D. DOLAN*, ROBERT MICHAEL LEWIS*, AND VIRGINIA TORCZON§ 


Abstract. We examine the local convergence properties of pattern search methods, complementing the 
previously established global convergence properties for this class of algorithms. We show that the step-length
control parameter which appears in the definition of pattern search algorithms provides a reliable asymptotic 
measure of first-order stationarity. This gives an analytical justification for a traditional stopping criterion 
for pattern search methods. Using this measure of first-order stationarity, we analyze the behavior of pattern 
search in the neighborhood of an isolated local minimizer. We show that a recognizable subsequence converges 
r-linearly to the minimizer. 

Key words. pattern search, local convergence analysis, global convergence analysis, desultory rate of
convergence 

Subject classification. Applied and numerical mathematics 

1. Introduction. Pattern search methods are a class of direct search methods for solving nonlinear 
optimization problems. The global convergence properties of pattern search for both constrained and un- 
constrained problems have been established in a series of papers [7-11]. In this paper, we consider the local 
convergence properties of pattern search and revisit the global convergence properties in light of these new 
results. 

For simplicity, our discussion will focus on the case of unconstrained minimization: 

\[ \min_{x \in \mathbb{R}^n} f(x). \]

Results similar to those we present here can also be derived for the general case of bound and linear constraints 
[9, 10]. However, the underlying ideas are simpler to explain for the unconstrained case. 

We first show how the pattern size parameter, which plays a central role in the definition of pattern 
search methods and tacitly serves as a step-length control mechanism, also provides a reliable asymptotic 
measure of first-order stationarity. This gives an analytical justification for the traditional use of the pattern 
size parameter as a stopping criterion. We also establish a local convergence result concerning the behavior 
of the sequence of iterates produced by a pattern search algorithm in the neighborhood of an isolated local 
minimizer $x_*$. We show that under reasonable hypotheses the sequence of iterates converges to $x_*$ and,
moreover, an identifiable subsequence of the iterates converges r-linearly to $x_*$. These analytical results are
illustrated with some simple numerical experiments on quadratic objectives. 

What is interesting about the analysis presented here is that we can establish local convergence properties 
despite the fact that direct search methods do not employ an explicit representation of the gradient of the 
objective and, as a consequence, cannot enforce a notion of sufficient decrease. We proved global convergence 
results for pattern search by showing that all iterates lie on a rational lattice. It is this restriction on the form 
of the steps that allows us to relax the notion of sufficient decrease and yet still prove global convergence. 
Pattern search may accept any point on the current lattice so long as it produces simple decrease on the value 
of the function at the current iterate. However, key to the global analysis is the notion of having searched 
in a sufficient number of directions from the current iterate to guarantee that we have not overlooked a 
potential direction of descent. It is only after searching over a sufficient set of directions that we are allowed 
to reduce the current step-length control parameter — which has the effect of refining the lattice over which 
we are searching. 

This notion of sufficient local information at iterations at which we reduce the pattern size allows us to 
show that the pattern size, as measured by the step-length control parameter, provides a reliable asymptotic 
measure of first-order stationarity. This analytical result is gratifying since it vindicates the long-standing 
use of the step-length control parameter as a stopping criterion for direct search methods (see, for instance, 
Section 4 of [6]). The result on the correlation of the pattern size parameter and stationarity then enables 
us to study the local convergence properties. 

Notation. We use $L(x_0)$ to denote the set $\{ x \mid f(x) \le f(x_0) \}$. It is assumed, unless otherwise noted,
that all norms are Euclidean vector norms or the associated operator norm. We will use $\partial$ to denote the
boundary of a given set. Given $x$ and $r > 0$, we denote by $B(x,r)$ the open ball of radius $r$ centered at $x$:
$B(x,r) = \{ y \mid \| y - x \| < r \}$.

2. Pattern search. We first review the elements of pattern search that play a role in our local analysis. 
There are rigorous formal definitions of pattern search [7, 11], several features of which we will shortly recall. 
However, pattern search can perhaps be most quickly understood with the following simple example of a 
pattern search algorithm. At iteration $k$, we have an iterate $x_k \in \mathbb{R}^n$ and a step-length control parameter
$\Delta_k > 0$. Let $e_j$, $j = 1, \ldots, n$, be the standard unit basis vectors. For the purposes of this example, we
represent the pattern of points over which we will search as the set $D = \{d_i\}_{i=1}^{2n} = \{e_1, \ldots, e_n, -e_1, \ldots, -e_n\}$,
though, as we discuss shortly, many other choices are possible. We now have several algorithmic options
open to us. We will consider the simple opportunistic strategy, which is to look successively at the points
$x_+ = x_k + \Delta_k d_i$, $i \in \{1, \ldots, 2n\}$, until we either find an $x_+$ for which $f(x_+) < f(x_k)$ or we exhaust all $2n$
possibilities. Fig. 2.1 illustrates the pattern of points among which we search for $x_+$ when $n = 2$.

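[Figure omitted: the current iterate (filled circle) and the surrounding trial points (open circles), each at distance $\Delta_k$.]

Fig. 2.1. A simple instance of pattern search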

If we find no $x_+$ such that $f(x_+) < f(x_k)$, then we call the iteration unsuccessful; otherwise, we
consider the iteration successful since we have found a new iterate that produces decrease on $f$ at $x_k$.
When the iteration is unsuccessful, we set $x_{k+1} = x_k$ and are required to reduce $\Delta_k$ (typically, by a half)
before continuing; otherwise, for a successful iteration, we set $x_{k+1} = x_+$ and leave the step-length control
parameter alone, i.e., $\Delta_{k+1} = \Delta_k$ (though the analysis also allows us to increase $\Delta_k$ if the iteration is a
success). We repeat this process until some suitable stopping criterion is satisfied.
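
As an illustration only, the simple algorithm just described can be sketched in Python. The function name `compass_search`, the stopping tolerance `tol`, the iteration cap `max_iter`, and the `history` list are assumptions introduced for this sketch; the halving of the step-length control parameter follows the typical choice mentioned above.

```python
def compass_search(f, x0, delta0, tol=1e-6, max_iter=10_000):
    """Opportunistic search over the 2n directions e_1..e_n, -e_1..-e_n."""
    x = list(x0)
    delta = delta0
    fx = f(x)
    n = len(x)
    # Directions in the order e_1, ..., e_n, -e_1, ..., -e_n.
    directions = [(i, 1.0) for i in range(n)] + [(i, -1.0) for i in range(n)]
    history = []  # (iterate, delta) recorded at each unsuccessful iteration
    for _ in range(max_iter):
        if delta < tol:
            break
        success = False
        for i, sign in directions:
            trial = list(x)
            trial[i] += sign * delta
            f_trial = f(trial)
            if f_trial < fx:  # simple decrease is all that is required
                x, fx = trial, f_trial
                success = True
                break
        if not success:
            # All 2n points were polled without decrease: halve delta.
            history.append((list(x), delta))
            delta *= 0.5
    return x, fx, delta, history
```

A call such as `compass_search(lambda v: (v[0] - 1.0) ** 2 + (v[1] + 2.0) ** 2, [0.0, 0.0], 1.0)` returns the final iterate, its function value, the final step length, and the record of unsuccessful iterations.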

Note that overall our requirements on the outcome of the search at each iteration are light: if after
searching over all the points defined by $\Delta_k d_i$, $i = 1, \ldots, 2n$, we fail to find a point $x_+ = x_k + \Delta_k d_i$ that
reduces the value of $f$ at $x_k$, then we must try again with a smaller value of $\Delta_k$. Otherwise, we accept as
our new iterate the first point in the pattern that produces decrease. In the latter case, we may choose to
modify $\Delta_k$. In either case, we are free to make changes to the pattern $D$ to be used in the next iteration,
though we leave the pattern unchanged in the example given above. However, in general, changes to either
the step-length control parameter or the pattern are subject to certain algebraic conditions, outlined fully
in [7].

A distinguishing characteristic of pattern search methods is that they sample the function over a predefined
pattern of points, all of which lie on a rational lattice. By enforcing structure on the form of the points
in the pattern, as well as some simple rules on both the outcome of the search and the subsequent updates,
standard global convergence results can be obtained [7, 11].

There remains the question of what constitutes an acceptable pattern. A pattern must be a positive
spanning set for $\mathbb{R}^n$ [4, 7]. A set of vectors $\{a_1, \ldots, a_p\}$ positively spans $\mathbb{R}^n$ if any vector $x \in \mathbb{R}^n$ can be
written as a nonnegative linear combination of the vectors in the set; i.e.,

\[ x = \alpha_1 a_1 + \cdots + \alpha_p a_p, \qquad \alpha_i \ge 0 \quad \forall i. \]

The set $\{a_1, \ldots, a_r\}$ is called positively dependent if one of the $a_i$'s is a nonnegative combination of the
others; otherwise the set is positively independent. A positive basis is a positively independent set whose
positive span is $\mathbb{R}^n$.

It is straightforward to verify that the set of vectors $\{e_1, e_2, -e_1, -e_2\}$ we used to define the pattern for
our simple example above is a positive spanning set, as is $\{e_1, \ldots, e_n, -e_1, \ldots, -e_n\}$ for other values of $n$.
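
The verification can be made concrete with a short sketch: splitting each component of a vector into its positive and negative parts gives nonnegative weights on the coordinate directions and their negatives. The helper names `nonnegative_coefficients` and `recombine` are hypothetical and serve only this illustration.

```python
def nonnegative_coefficients(x):
    """Weights of x in the set {e_1..e_n, -e_1..-e_n}; every weight is >= 0."""
    plus = [max(xi, 0.0) for xi in x]    # weights on e_1, ..., e_n
    minus = [max(-xi, 0.0) for xi in x]  # weights on -e_1, ..., -e_n
    return plus + minus


def recombine(coeffs):
    """Rebuild the vector from its 2n nonnegative weights."""
    n = len(coeffs) // 2
    return [coeffs[i] - coeffs[n + i] for i in range(n)]


x = [3.0, -1.5, 0.0]
alpha = nonnegative_coefficients(x)
```

Every entry of `alpha` is nonnegative and `recombine(alpha)` reproduces `x`, which is exactly the defining property of a positive spanning set for this choice of directions.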

2.1. Prior results. Before proceeding to our local convergence results, we recall the following propo- 
sition from [7], which we will state here without proof. 

Proposition 2.1. Given any set $\{a_1, \ldots, a_r\}$ that positively spans $\mathbb{R}^n$, $a_i \ne 0$ for $i = 1, \ldots, r$, there
exists $c_{2.1} > 0$ such that for all $x \in \mathbb{R}^n$, we can find an $a_i$ for which

\[ x^T a_i \ge c_{2.1} \, \| x \| \, \| a_i \|. \]

Note that this is a purely geometric property of positive spanning sets. 
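
For the coordinate directions of the simple example, the constant in Proposition 2.1 can be taken to be $1/\sqrt{n}$, because the largest component of $x$ in absolute value is at least $\| x \| / \sqrt{n}$. The following sketch estimates the constant by random sampling; the function names, the sample count, and the seed are assumptions made for the illustration, and sampling can only approach the true constant from above.

```python
import math
import random


def best_cosine(x, directions):
    """Largest cosine of the angle between x and any direction in the set."""
    norm_x = math.sqrt(sum(v * v for v in x))
    best = -1.0
    for a in directions:
        norm_a = math.sqrt(sum(v * v for v in a))
        dot = sum(xi * ai for xi, ai in zip(x, a))
        best = max(best, dot / (norm_x * norm_a))
    return best


def compass_directions(n):
    """The 2n vectors e_1, ..., e_n, -e_1, ..., -e_n."""
    eye = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    return eye + [[-v for v in row] for row in eye]


def sampled_constant(n, samples=2000, seed=0):
    """Smallest best-cosine seen over random x: an upper estimate of c_2.1."""
    rng = random.Random(seed)
    dirs = compass_directions(n)
    worst = 1.0
    for _ in range(samples):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        worst = min(worst, best_cosine(x, dirs))
    return worst
```

Comparing `sampled_constant(n)` with `1 / math.sqrt(n)` for small `n` shows the sampled value staying at or above the analytic bound.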

2.2. Some formal definitions. We also need to recall some notation regarding both the pattern and 
the form of the search. For the details, we refer the reader to [7, 11]. 

We have already noted that the pattern must be a positive spanning set for $\mathbb{R}^n$. In fact, we represent
the pattern using two components, a basis matrix and a generating matrix.

The basis matrix can be any nonsingular matrix $B \in \mathbb{R}^{n \times n}$.

The generating matrix is an integral matrix $C_k \in \mathbb{Z}^{n \times p_k}$, where $p_k > n + 1$. We require $C_k$ to contain a
minimum of $n + 2$ columns because the minimum number of vectors in a positive spanning set is $n + 1$ [4] and,
for convenience, we require an additional column of zeros to denote the zero step. We further partition the generating
matrix so that the positive basis that guarantees that the pattern positively spans $\mathbb{R}^n$ is revealed. We call the
columns associated with the positive basis the core pattern, which we denote as $\Gamma_k$; any remaining columns
in the positive spanning set are denoted using $L_k$:

\[ C_k = [\, \Gamma_k \;\; L_k \;\; 0 \,]. \tag{2.1} \]

We further require that $\Gamma_k \in \Gamma$, where $\Gamma$ comprises a finite set of integral matrices, each of which is a positive
basis for $\mathbb{R}^n$.

A pattern $P_k$ is then represented by the columns of the matrix $P_k = B C_k$. For convenience, we use the
partition of the generating matrix $C_k$ given in (2.1) to partition $P_k$ as follows:

\[ P_k = B C_k = [\, B \Gamma_k \;\; B L_k \;\; 0 \,]. \]

To tie this notation back to the example that introduces Section 2, we note that $B = I$, $\Gamma_k = [\, I \;\; -I \,]$,
and $L_k = [0]$ (where $0$ denotes the zero vector). Since the choice of $\Gamma_k$ and $L_k$ is fixed for all $k$, $P_k = [\, D \;\; 0 \,]$
for all $k$.

Now, given the step-length control parameter $\Delta_k \in \mathbb{R}$, $\Delta_k > 0$, we define a trial step $s_k^i$ to be any vector
of the form $s_k^i = \Delta_k B c_k^i$, where $c_k^i$ is a column of $C_k$.
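
To make the notation concrete, the following sketch builds the generating matrix of the compass example and lists the resulting trial steps. It keeps a single zero column, matching $P_k = [\, D \;\; 0 \,]$; the helper names `matmul`, `compass_generating_matrix`, and `trial_steps` are hypothetical, and matrices are stored as plain lists of rows.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def compass_generating_matrix(n):
    """C = [I  -I  0]: the core pattern [I -I] followed by the zero column."""
    rows = []
    for i in range(n):
        plus = [1 if j == i else 0 for j in range(n)]
        minus = [-v for v in plus]
        rows.append(plus + minus + [0])
    return rows


def trial_steps(delta, basis, generating):
    """Columns of delta * B * C, returned as a list of step vectors."""
    pattern = matmul(basis, generating)
    n, p = len(pattern), len(pattern[0])
    return [[delta * pattern[i][j] for i in range(n)] for j in range(p)]


identity = [[1, 0], [0, 1]]
steps = trial_steps(0.5, identity, compass_generating_matrix(2))
```

With `delta = 0.5` and `n = 2`, `steps` holds the four scaled coordinate steps followed by the zero step. Replacing `identity` by another nonsingular integer or rational matrix changes the lattice without changing the code.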

In Fig. 2.2 we state the general form of a pattern search method for unconstrained minimization. 

Let $x_0 \in \mathbb{R}^n$ and $\Delta_0 > 0$ be given.

For $k = 0, 1, \ldots$ until convergence do:

1. Compute $f(x_k)$.

2. Determine a step $s_k$ using an unconstrained exploratory moves algorithm.

3. If $f(x_k + s_k) < f(x_k)$, then $x_{k+1} = x_k + s_k$. Otherwise $x_{k+1} = x_k$.

4. Update $C_k$ and $\Delta_k$.

Fig. 2.2. Generalized pattern search for unconstrained minimization

We have remarkable latitude in the way in which we choose the step s k . For the global convergence 
analysis to hold, we need only satisfy the hypotheses on the outcome of the unconstrained exploratory moves, 
given in Fig. 2.3. 

1. $s_k \in \Delta_k P_k$.

2. If $\min \{ f(x_k + y) \mid y \in \Delta_k B \Gamma_k \} < f(x_k)$, then $f(x_k + s_k) < f(x_k)$.

Fig. 2.3. Hypotheses on the outcome of the unconstrained exploratory moves


A few comments on these hypotheses are in order. The first hypothesis is straightforward: the step
returned must be defined by the current pattern $P_k$, scaled by the current value of the step-length control
parameter $\Delta_k$. This is the condition that ensures that the steps we consider remain on the rational lattice;
arbitrary steps are not allowed.

For our purposes, the second hypothesis is the more interesting. Notice that in Fig. 2.2, all that is required
for a successful iteration of pattern search is that the step $s_k$ produce simple decrease, i.e., $f(x_k + s_k) < f(x_k)$.
Thus, any nonzero step defined by a column of $\Delta_k P_k$ that satisfies the condition $f(x_k + s_k) < f(x_k)$ may be
returned by the exploratory moves since it immediately satisfies both of the hypotheses given in Fig. 2.3,
even if we do not explicitly verify that $\min \{ f(x_k + y) \mid y \in \Delta_k B \Gamma_k \} < f(x_k)$ is true. Thus, as we suggested
for the pattern search algorithm described in Section 2, as soon as we find a point in the pattern that satisfies
this simple decrease criterion, we may terminate the iteration and declare it a success.

Recall, however, that $P_k$ also contains a column of all zeros, so the question is: when is the zero step
acceptable? The point of the second hypothesis in Fig. 2.3 is to ensure that we have sufficient information
about the local behavior of $f$ to declare an iteration unsuccessful, accept the zero step $s_k = 0$ (so that
$x_{k+1} = x_k$), and reduce $\Delta_k$ to continue the search with smaller steps at the next iteration. The second
hypothesis implicitly decrees that we may only return the zero step when we have looked at all the steps
defined by the core pattern, i.e., all steps of the form $y \in \Delta_k B \Gamma_k$. If none of the steps in the core pattern
produce decrease on $f$ at $x_k$, then we are free to accept the zero step. But if any step in the core pattern
produces descent, the exploratory moves must return a step that produces descent, though as we have
already seen, that step need not be defined by $\Delta_k B \Gamma_k$ so long as it is defined by $\Delta_k B C_k$.
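
One way to satisfy both hypotheses is sketched below. The routine first tries any additional pattern steps, then the core steps, and returns the zero step only after every core step has failed to produce decrease. The function name `exploratory_moves` and its argument names are assumptions for this sketch, and the ordering of the trial steps is only one of many admissible choices.

```python
def exploratory_moves(f, x, core_steps, extra_steps=()):
    """Return a step that satisfies both hypotheses of Fig. 2.3.

    core_steps  -- the columns of delta * B * Gamma (a positive basis)
    extra_steps -- optional columns of delta * B * L, tried first
    """
    fx = f(x)
    for step in list(extra_steps) + list(core_steps):
        trial = [xi + si for xi, si in zip(x, step)]
        if f(trial) < fx:
            return list(step)  # simple decrease: accept immediately
    # Every core step was polled without decrease, so the zero step is
    # admissible and the iteration is declared unsuccessful.
    return [0.0] * len(x)
```

Because the core steps are always polled before the zero step is returned, a decrease available in the core pattern can never be overlooked, even when the accepted step comes from `extra_steps`.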

For the purposes of the local convergence analysis that follows, it is the subsequence of unsuccessful
iterates that interests us. To accept the zero step, the search must have considered all the points defined by
the core pattern, which is itself a positive basis. This is where the geometric property of positive spanning
sets captured in Proposition 2.1 comes in. Even though we do not have an explicit representation of $\nabla f(x_k)$
(assuming that $f$ is differentiable), Proposition 2.1 gives us a positive lower bound, which is independent of $k$,
on the cosine of the angle between $-\nabla f(x_k)$ (assuming it is nonzero) and some $a_i$ in the positive spanning set, even though
at any given iteration we do not know for which $a_i$ this lower bound holds. However, this guaranteed lower
bound, when combined with the second hypothesis in Fig. 2.3, ensures that at the end of an unsuccessful
iteration, we have sufficient information about the local behavior of $f$ at $x_k$. Furthermore, the quality of our
local information improves as we reduce $\Delta_k$.

Thus we have enough structure to construct local convergence results. The subsequence of unsuccessful
iterates is well-defined: they are the iterations at which we can, and indeed must, reduce $\Delta_k$ to ensure that the
search can make further progress. But we reduce $\Delta_k$ only after we have sufficient local information about
the behavior of $f$ to justify this action: we have considered all the steps defined by the columns of $\Delta_k B \Gamma_k$
and none of them have produced descent on $f$ at $x_k$. This is the fact we now exploit.

3. Measuring first-order stationarity. The following theorem shows that the step-length control
parameter $\Delta_k$, when small enough, provides a reasonable measure of first-order stationarity when reduced
after an unsuccessful iteration. For simplicity, we assume that $\nabla f(x)$ is Lipschitz continuous; however, for
the reader interested in greater generality we note that a similar result can be proven under the assumption
of uniform continuity.

Theorem 3.1. Suppose $\nabla f(x)$ is Lipschitz continuous on an open neighborhood $\Omega$ of $L(x_0)$ with Lipschitz
constant $C$. Then there exist $\delta_{3.1} > 0$ and $c_{3.1} > 0$ for which the following holds. If $x_k$ is an iterate at
which there is an unsuccessful iteration and $\Delta_k < \delta_{3.1}$, then

\[ \| \nabla f(x_k) \| \le c_{3.1} \Delta_k. \]


Proof. Let $r = \tfrac{1}{2} \min\{1, \operatorname{dist}(\operatorname{cl} L(x_0), \partial\Omega)\}$. If $x \in L(x_0)$, then the ball $B(x, r)$ is contained in $\Omega$. We are
interested in steps of the form $s = \Delta_k B c_k$, where $c_k$ is a column of the core matrix $\Gamma_k$. Since $\Gamma_k \in \Gamma$ and $\Gamma$
is finite, $\| s \| \le \Delta_k \| B \| \Gamma^*$, where $\Gamma^*$ is the maximum norm of any column of the matrices in the set $\Gamma$. Set
$\delta_{3.1} = r / (\| B \| \Gamma^*)$.

By the definition of pattern search, for any $\Gamma_k \in \Gamma$, the set $\{ s \mid s \in \Delta_k B \Gamma_k \}$ forms a positive basis for
$\mathbb{R}^n$. Thus Proposition 2.1 assures us of the existence of a step $s$ for which

\[ -\nabla f(x_k)^T s \ge c_{2.1} \, \| \nabla f(x_k) \| \, \| s \|. \tag{3.1} \]

Since iteration $k$ is unsuccessful, it follows that

\[ f(x_k + s) - f(x_k) \ge 0 \quad \forall\, s \in \Delta_k B \Gamma_k. \]

Since $\Delta_k < \delta_{3.1}$, $(x_k + s) \in B(x_k, r) \subset \Omega$ and we can apply the mean value theorem. In addition, using (3.1)
and the Cauchy-Schwarz inequality, for some $\xi$ in the line segment connecting $x_k$ and $x_k + s$ we have

\begin{align*}
0 &\le f(x_k + s) - f(x_k) \\
  &= \nabla f(x_k)^T s + (\nabla f(\xi) - \nabla f(x_k))^T s \\
  &\le -c_{2.1} \| \nabla f(x_k) \| \, \| s \| + \| \nabla f(\xi) - \nabla f(x_k) \| \, \| s \|,
\end{align*}

where $s$ is the step for which (3.1) holds. Thus

\[ c_{2.1} \| \nabla f(x_k) \| \le \| \nabla f(\xi) - \nabla f(x_k) \|. \]

Again, since $B(x_k, r) \subset \Omega$, the Lipschitz continuity of $\nabla f(x)$ gives us

\[ c_{2.1} \| \nabla f(x_k) \| \le C \| \xi - x_k \| \le C \| s \| \le C \Delta_k \| B \| \Gamma^*. \]

Therefore

\[ \| \nabla f(x_k) \| \le c_{3.1} \Delta_k, \]

with $c_{3.1} = C \| B \| \Gamma^* / c_{2.1}$. □

Theorem 3.1 gives a theoretical justification for a traditional stopping criterion for pattern search methods.
In the long literature on direct search methods, one frequently encounters the suggestion that a direct
search method be terminated when some measure of the step size first falls below a value deemed suitably
small [2, 3, 6]. In the case of pattern search, Theorem 3.1 vindicates this intuition. At unsuccessful iterations,
the step size in pattern search (as measured by $\Delta_k$) provides a bound on first-order stationarity. At the
same time, it is at the unsuccessful iterations that $\Delta_k$ is decreased. Thus, decrease in $\Delta_k$ provides a natural
measure of progress which can reliably be used to test for convergence. We discuss further the use of $\Delta_k$ to
measure progress when we present some numerical examples in Section 5.
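
The bound of Theorem 3.1 can be checked numerically on a small example. The sketch below runs the compass search of Section 2 on an illustrative two-variable quadratic $f(x) = x^T A x$ and records the ratio of the gradient norm to the step length at each unsuccessful iteration. The matrix `A`, the starting point, and the tolerance are assumptions chosen for the illustration. For this problem the Lipschitz constant of the gradient is twice the largest eigenvalue of `A`, $\| B \| = 1$, $\Gamma^* = 1$, and the constant of Proposition 2.1 is $1/\sqrt{2}$.

```python
import math

A = [[2.0, 1.0], [1.0, 3.0]]  # symmetric positive definite (illustrative)


def f(x):
    return sum(x[i] * A[i][j] * x[j] for i in range(2) for j in range(2))


def grad_norm(x):
    g = [2.0 * sum(A[i][j] * x[j] for j in range(2)) for i in range(2)]
    return math.hypot(g[0], g[1])


def unsuccessful_ratios(x, delta, tol=1e-8):
    """Compass search; returns ||grad f|| / delta at unsuccessful iterations."""
    steps = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]
    ratios = []
    while delta >= tol:
        fx = f(x)
        for d in steps:
            trial = [x[0] + delta * d[0], x[1] + delta * d[1]]
            if f(trial) < fx:
                x = trial  # successful iteration: keep delta
                break
        else:
            # No direction gave decrease: record the ratio, then halve delta.
            ratios.append(grad_norm(x) / delta)
            delta *= 0.5
    return ratios


lam_max = (5.0 + math.sqrt(5.0)) / 2.0        # largest eigenvalue of A
c31 = (2.0 * lam_max) * math.sqrt(2.0)        # C * ||B|| * Gamma^* / c_2.1
ratios = unsuccessful_ratios([1.7, -0.9], 1.0)
```

Every entry of `ratios` should lie below `c31`, which is what makes a small step length a trustworthy signal that the gradient is small.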

A similar relation between $\Delta_k$ and constrained stationarity in the case of pattern search for bound
constrained problems is explicitly used in the pattern search augmented Lagrangian algorithm in [8]. The
result plays a critical role in allowing successive inexact minimization of an augmented Lagrangian without
an explicit estimate of the gradient. A relation similar to Theorem 3.1 for linearly constrained pattern search
appears in [10].

The global convergence analysis of pattern search in [11] says that if $L(x_0)$ is compact then we have
$\liminf_{k \to \infty} \Delta_k = 0$ and $\liminf_{k \to \infty} \| \nabla f(x_k) \| = 0$. The former result and Theorem 3.1 allow us
to sharpen the latter result. Let the set $S$ represent a subsequence of unsuccessful iterations for which
$\lim_{k \to \infty,\, k \in S} \Delta_k = 0$ (such a subsequence exists since $\liminf_{k \to \infty} \Delta_k = 0$). Then Theorem 3.1 says that we
have $\lim_{k \to \infty,\, k \in S} \| \nabla f(x_k) \| = 0$.

The general result $\liminf_{k \to \infty} \| \nabla f(x_k) \| = 0$ for pattern search leaves open the possibility that
$\| \nabla f(x_k) \|$ does not converge. In [1], Audet shows that this can actually occur by constructing a pattern
search algorithm and objective for which $\{ x_k \}$ has infinitely many limit points, one of which is not a
stationary point of the objective. However, Theorem 3.1 reassures us that in practice we need not worry
about convergence to non-stationary points. If we stop the algorithm at the first unsuccessful iterate for
which $\Delta_k < \Delta_*$ for some suitably small stopping tolerance $\Delta_*$, then Theorem 3.1 says that $\| \nabla f(x_k) \|$ will
also be small.

4. Local convergence. We next apply Theorem 3.1 to prove results on the local convergence of pattern
search methods. We place the following mild hypothesis on the patterns $P_k$ in order to bound the size of
the steps $s_k$.

Hypothesis 0. The columns of the pattern matrix $P_k = [\, c_k^1 \, \cdots \, c_k^{p_k} \,]$ remain bounded in norm, i.e., there
exists $C > 0$ such that for all $k$, $C > \| c_k^i \|$, for all $i = 1, \ldots, p_k$. Thus, there exists a constant $c_0 > 0$ such
that any step $s_k$ satisfies

\[ \| s_k \| \le c_0 \Delta_k. \]


We also impose the following condition on the pattern size parameter $\Delta_k$.

Hypothesis 1. There exists $N$ for which $\Delta_k$ is monotonically nonincreasing for all $k > N$.

Note that this is a condition we can explicitly enforce by not allowing increases in $\Delta_k$ after some iteration
$N$; $\Delta_k$ can stay the same or decrease.

Our analysis is concerned with how pattern search behaves in the neighborhood of an isolated local 
minimizer x*. We make the following assumptions about the behavior of / in the neighborhood of x*. 

Hypothesis 2. We assume that $f$ is twice continuously differentiable on an open ball $B(x_*, \eta)$ of $x_*$,
$\nabla f(x_*) = 0$, and that the second order sufficiency condition $\nabla^2 f(x_*) > 0$ holds at $x_*$.

We further assume that $\nabla^2 f(x)$ is positive definite for all $x \in B(x_*, \eta)$. Let $\sigma_{\min}$ and $\sigma_{\max}$ be lower and
upper bounds, respectively, on the singular values of $\nabla^2 f(x)$ on $B(x_*, \eta)$, and define $\kappa = \sigma_{\max} / \sigma_{\min}$.

For convenience, we will assume that $\eta < \delta_{3.1}$, where $\delta_{3.1}$ is as in Theorem 3.1. Clearly we may do
this without any loss of generality. This assumption ensures that $B(x_*, \eta) \subset \Omega$ and that we may apply
Theorem 3.1.

Our first result relates $\Delta_k$ to $\| x_k - x_* \|$ at unsuccessful iterates.

Proposition 4.1. Under Hypotheses 0-2, there exist $\eta > 0$ and $c_{4.1} > 0$ for which the following holds.
If $x_k$ is an iterate at which there is an unsuccessful iteration, $\Delta_k < \eta$, and $\| x_k - x_* \| < \eta$, then

\[ \| x_k - x_* \| \le c_{4.1} \Delta_k. \]

Proof. By the mean value theorem,

\[ \nabla f(x_k) - \nabla f(x_*) = \nabla^2 f(\xi)(x_k - x_*) \]

for some $\xi$ on the line segment connecting $x_k$ and $x_*$. Since $\nabla f(x_*) = 0$, we have

\[ \| \nabla f(x_k) \| = \| \nabla^2 f(\xi)(x_k - x_*) \| \ge \sigma_{\min} \| x_k - x_* \|. \]

Furthermore, Theorem 3.1 holds since $\Delta_k < \eta < \delta_{3.1}$, whence

\[ \sigma_{\min} \| x_k - x_* \| \le \| \nabla f(x_k) \| \le c_{3.1} \Delta_k. \]

Setting $c_{4.1} = c_{3.1} / \sigma_{\min}$ completes the proof. □

We wish to prove that the entire sequence of iterates converges to $x_*$. To this end, we begin with an
elementary result concerning the level sets of $f$ near $x_*$.

Proposition 4.2. Under Hypothesis 2, if $x, y \in B(x_*, \eta)$ and $f(x) \le f(y)$, then

\[ \| x - x_* \| \le \kappa^{1/2} \| y - x_* \|, \tag{4.1} \]

where $\kappa$ is as defined in Hypothesis 2.

Proof. Suppose $x, y \in B(x_*, \eta)$ and $f(x) \le f(y)$. From Taylor's theorem with remainder and the fact
that $\nabla f(x_*) = 0$, we have

\begin{align*}
f(y) &= f(x_*) + \tfrac{1}{2} (y - x_*)^T \nabla^2 f(\xi) (y - x_*), \\
f(x) &= f(x_*) + \tfrac{1}{2} (x - x_*)^T \nabla^2 f(\omega) (x - x_*)
\end{align*}

for $\xi$ and $\omega$ on the line segments connecting $x_*$ with $y$ and $x$, respectively. Since $f(x) \le f(y)$, we obtain

\[ 0 \le f(y) - f(x) = \tfrac{1}{2} (y - x_*)^T \nabla^2 f(\xi) (y - x_*) - \tfrac{1}{2} (x - x_*)^T \nabla^2 f(\omega) (x - x_*), \]

whence

\[ 0 \le \sigma_{\max} \| y - x_* \|^2 - \sigma_{\min} \| x - x_* \|^2, \]

and thus (4.1). □
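
A quick numerical check of (4.1) is possible for a quadratic with minimizer at the origin. In the sketch below the diagonal entries `a`, the sampling box, the sample count, and the seed are assumptions made for the illustration. For $f(x) = \sum_i a_i x_i^2$ the Hessian is $2\,\mathrm{diag}(a)$, so $\kappa$ is the ratio of the largest to the smallest entry of `a`.

```python
import math
import random

a = [1.0, 4.0, 9.0]  # illustrative diagonal entries; minimizer is the origin


def f(x):
    return sum(ai * xi * xi for ai, xi in zip(a, x))


def norm(x):
    return math.sqrt(sum(v * v for v in x))


# Hessian is 2 * diag(a): sigma_min = 2 * min(a), sigma_max = 2 * max(a).
kappa = max(a) / min(a)


def worst_ratio(samples=5000, seed=1):
    """Largest observed ||x|| / ||y|| over sampled pairs with f(x) <= f(y)."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(samples):
        x = [rng.uniform(-1.0, 1.0) for _ in a]
        y = [rng.uniform(-1.0, 1.0) for _ in a]
        if f(x) > f(y):
            x, y = y, x  # enforce f(x) <= f(y)
        if norm(y) > 0.0:
            worst = max(worst, norm(x) / norm(y))
    return worst
```

The value returned by `worst_ratio()` should not exceed `math.sqrt(kappa)`, in agreement with (4.1).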

We use the previous proposition to show that if we start sufficiently close to $x_*$ with a sufficiently small
pattern size parameter $\Delta_k$, and we stop allowing increases in $\Delta_k$ after some point (Hypothesis 1), then
pattern search will not move away from a neighborhood of $x_*$.

Proposition 4.3. Under Hypotheses 0-2, there exist $\delta_{4.3} > 0$, $\varepsilon_{4.3} > 0$, and $c_{4.3} > 0$ for which the
following holds. For $k > N$, where $N$ is as defined in Hypothesis 1, if $x_k$ is an iterate for which $\Delta_k < \delta_{4.3}$
and $\| x_k - x_* \| < \varepsilon_{4.3}$, then for all $\ell > k$,

\[ \| x_\ell - x_* \| < \eta, \]

where $\eta$ is as defined in Hypothesis 2.
Proof. Choose $\delta_{4.3}$ and $\varepsilon_{4.3}$ to satisfy

\[ \delta_{4.3} < \frac{\eta}{2 c_0}, \qquad \varepsilon_{4.3} < \min\left\{ \frac{\eta}{2}, \; \frac{\eta}{2} \kappa^{-1/2} \right\}. \]

The proof is by induction. First consider $x_{k+1} = x_k + s_k$. By Hypothesis 0, there is a constant $c_0$ such
that $\| x_{k+1} - x_k \| = \| s_k \| \le c_0 \Delta_k$. We have, a priori,

\[ \| x_{k+1} - x_* \| \le \| x_{k+1} - x_k \| + \| x_k - x_* \| \le c_0 \Delta_k + \varepsilon_{4.3} < \eta, \]

so $x_{k+1} \in B(x_*, \eta)$. Since $f(x_{k+1}) \le f(x_k)$, and $x_{k+1} \in B(x_*, \eta)$, from Proposition 4.2 we obtain

\[ \| x_{k+1} - x_* \| \le \kappa^{1/2} \| x_k - x_* \| \le \kappa^{1/2} \varepsilon_{4.3} < \eta. \]

Now consider any $\ell \ge k + 1$, and suppose

\[ \| x_\ell - x_* \| < \eta. \]

Then

\[ \| x_{\ell+1} - x_* \| \le \| x_{\ell+1} - x_\ell \| + \| x_\ell - x_* \|. \tag{4.2} \]

Hypothesis 1 assures us that $\Delta_\ell \le \Delta_k$ for $\ell \ge k$, so

\[ \| x_{\ell+1} - x_\ell \| \le c_0 \Delta_\ell \le c_0 \Delta_k. \]

Meanwhile, by the induction hypothesis, $x_\ell \in B(x_*, \eta)$. Since $f(x_\ell) \le f(x_k)$, as well, Proposition 4.2 and
the assumption $\| x_k - x_* \| < \varepsilon_{4.3}$ say that

\[ \| x_\ell - x_* \| \le \kappa^{1/2} \| x_k - x_* \| \le \kappa^{1/2} \varepsilon_{4.3}. \]

Thus (4.2) yields

\[ \| x_{\ell+1} - x_* \| \le c_0 \Delta_k + \kappa^{1/2} \varepsilon_{4.3} < \eta. \]

□

For the final local convergence result, we need the following additional hypothesis, which underlies the 
global convergence analysis of pattern search [11]. 

Hypothesis 3. The set $L(x_0)$ is compact.

From [11] we know that if $L(x_0)$ is compact, $\liminf_{k \to \infty} \Delta_k = 0$. Thus, Hypothesis 1 and Hypothesis 3
together mean that $\lim_{k \to \infty} \Delta_k = 0$.

Putting the pieces together, we obtain the following local convergence result for pattern search. 
Theorem 4.4. Suppose Hypothesis 3 holds. Given a pattern search algorithm satisfying Hypotheses
0-1, suppose there exists a limit point $x_*$ of the sequence of iterates $\{ x_k \}$ produced by the algorithm that is
a local minimizer satisfying Hypothesis 2.

Then there exist $c_{4.4} > 0$ and $K$ such that for all $k \ge K$,

\[ \| x_k - x_* \| \le c_{4.4} \Delta_{m(k)}, \tag{4.3} \]

where $m(k)$ is the last unsuccessful iterate at or preceding $k$. As a consequence, we have $\lim_{k \to \infty} x_k = x_*$.

Proof. Recall $N$ from Hypothesis 1, the iteration after which we no longer allow $\Delta_k$ to increase. Since
$x_*$ is a limit point of $\{ x_k \}$ and $\lim_{k \to \infty} \Delta_k = 0$ (by Hypotheses 1 and 3), there exists an iterate $x_{k_1}$, $k_1 > N$,
such that

\begin{align*}
\Delta_{k_1} &< \min\{ \eta, \delta_{4.3} \}, \\
\| x_{k_1} - x_* \| &< \min\{ \eta, \varepsilon_{4.3} \},
\end{align*}

where $\eta$ is as in Hypothesis 2 and $\delta_{4.3}$, $\varepsilon_{4.3}$ are as in Proposition 4.3.
By Proposition 4.3, we have

\[ \| x_k - x_* \| < \eta \]

for all $k > k_1$. Then, by Proposition 4.1, we have

\[ \| x_m - x_* \| \le c_{4.1} \Delta_m \]

for all unsuccessful iterates $x_m$ with $m > k_1$.

Now let $K$ be the first iteration after $k_1$ at which we have an unsuccessful iteration. For all unsuccessful
iterations $k \ge K$, Proposition 4.1 gives us

\[ \| x_k - x_* \| \le c_{4.1} \Delta_k = c_{4.1} \Delta_{m(k)}, \tag{4.4} \]

since $k = m(k)$ in this case. Meanwhile, for the successful iterations $k > K$, we have $f(x_k) \le f(x_{m(k)})$ and
Proposition 4.3 assures us that $\| x_k - x_* \| < \eta$. It then follows from Proposition 4.2 that

\[ \| x_k - x_* \| \le \kappa^{1/2} \| x_{m(k)} - x_* \| \le \kappa^{1/2} c_{4.1} \Delta_{m(k)}. \tag{4.5} \]

Then (4.3) follows from (4.4) and (4.5), with $c_{4.4} = \kappa^{1/2} c_{4.1}$.

Since Hypotheses 1 and 3 imply $\Delta_k \to 0$, it follows that $\lim_{k \to \infty} x_k = x_*$. □

Theorem 4.4 complements Theorem 3.7 of [11], where it is shown, under different hypotheses and a more
stringent criterion for accepting a step, that $\| \nabla f(x_k) \| \to 0$.

Theorem 4.4 says that for the subsequence of unsuccessful iterates, the rate of convergence is r-linear. 
Theorem 4.4 says nothing about what may happen at the successful iterations, nor how many such itera- 
tions there may be between unsuccessful iterations. What we have is a sort of multi-step r-linear rate of 
convergence, but one for which we do not know, and, as our numerical tests indicate, cannot predict, what 
the number of intervening steps may be. For want of an existing term for this notion of convergence, we call 
it desultory r-linear convergence. 
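
The r-linear rate along the unsuccessful iterates can be written out explicitly under one additional assumption that the hypotheses do not require: suppose that, beyond iteration $K$, every unsuccessful iteration multiplies the step-length control parameter by a fixed factor $\theta \in (0, 1)$ (for instance $\theta = 1/2$) and every successful iteration leaves it unchanged. Writing $m_0 < m_1 < m_2 < \cdots$ for the unsuccessful iterations from $K$ onward (an indexing introduced here for illustration), the bound of Theorem 4.4 then gives:

```latex
\[
  \Delta_{m_{j+1}} = \theta \, \Delta_{m_j}
  \quad \Longrightarrow \quad
  \| x_{m_j} - x_* \| \;\le\; c_{4.1} \, \Delta_{m_j}
  \;=\; c_{4.1} \, \Delta_{m_0} \, \theta^{j},
  \qquad j = 0, 1, 2, \ldots
\]
```

The right-hand side is a geometric sequence, so the errors at the unsuccessful iterates are bounded by a q-linearly convergent sequence, which is what r-linear convergence means. Nothing in this calculation bounds $m_{j+1} - m_j$, the number of successful iterations in between.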

The obstruction to sharpening the rate of convergence result is that we do not know a priori how much
improvement we obtain in $f(x)$ at the successful iterations. At unsuccessful iterates, we reach the decision
that we can make progress only by reducing $\Delta_k$, and only after we evaluate $f(x)$ along a set of directions that
necessarily includes a descent direction (see Fig. 2.3 and Proposition 2.1). On the other hand, successful
iterates are less informative. We have no a priori idea of how much improvement we obtain in a successful
iteration relative to the minimization of the quadratic model of $f(x)$ along the pattern search directions.
Given the paucity of information generally available in pattern search, one must not expect too much.

More positively, Theorem 4.4 suggests how one can “accelerate” the local convergence of pattern search 
algorithms. One need only rename the formerly unsuccessful iterates successful iterates and drop the formerly 
successful iterates from discussion. Then, mirabile dictu, this simple modification makes the successful
iterations an r-linearly convergent sequence. 

5. Numerical results. We now present some numerical experiments that illustrate the practical implications
of the local convergence results. The first round of testing, reported in Section 5.2, supports the
analysis; the second round, reported in Section 5.3, shows its limitations.

The numerical results we report are in no way exhaustive, and simply serve to illustrate how the local 
convergence results are manifest in practice. The results we report here on the effectiveness of $\Delta_k$ as a
measure of stationarity are, in part, a summary of some of the results reported in [5]. The second round of
testing regarding the local rate of convergence was based on the implementation of pattern search developed 
for, and reported in, [5]. 

5.1. The testing environment. Full details of the numerical experiments can be found in [5]. The
tests we report here were done with randomly generated positive definite quadratic functions. This is a
reasonable choice, since we are interested in the local convergence behavior of pattern search, and any $C^2$
function looks like a convex quadratic in the neighborhood of an isolated local minimizer. The quadratics
tested were of the form $f(x) = x^T A x + c$, where $A = H^T H$ and $H \in \mathbb{R}^{(n+2) \times n}$ is a matrix with entries
that are normal random variates with means of zero and standard deviations of one. The lack of a linear
term causes the solution to lie at the origin. The constant term $c$ is not interesting for the purposes of the
optimization, but provides a useful tag for identifying individual functions. For the testing in [5], $2 \le n \le 5$;
we show two results for $n = 5$.

In addition to randomly generating the entries of the matrix $H$, we also randomly generated $\Delta_0$ and the
entries of the vector $x_0$. The entries for the starting point $x_0$ were also normal random variates with means
of zero and standard deviations of one. The choice for $\Delta_0$ was an exponential variate with a mean of one.
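
A test problem of this kind can be generated along the following lines. This is a sketch in Python rather than the C++ software of [5]; the function name `random_quadratic`, the `seed` argument, and the pure-Python matrix storage are assumptions made for the illustration.

```python
import random


def random_quadratic(n, seed=None):
    """Random test problem f(x) = x^T A x + c with A = H^T H."""
    rng = random.Random(seed)
    # H is (n + 2) x n with independent standard normal entries.
    H = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n + 2)]
    # A = H^T H is symmetric and, with probability one, positive definite.
    A = [[sum(H[r][i] * H[r][j] for r in range(n + 2)) for j in range(n)]
         for i in range(n)]
    x0 = [rng.gauss(0.0, 1.0) for _ in range(n)]  # standard normal start
    delta0 = rng.expovariate(1.0)  # exponential variate with mean one

    def f(x, c=0.0):
        quad = sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))
        return quad + c

    return f, A, x0, delta0
```

Calling `random_quadratic(5, seed=42)` returns the objective, its matrix, a starting point, and an initial step length; fixing the seed makes a run reproducible.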

The software described in [5] was written in C++ to make use of C++ classes, a convenient way to
establish the key features of pattern search and then easily derive specific variants. Several of these variants
were implemented and tested, as described in [5]. We show results using HJSearch, an implementation of
the classical pattern search algorithm of Hooke and Jeeves [6]; CompassSearch, the pattern search algorithm
described in Section 2; and NLessSearch, a pattern search algorithm that takes advantage of the fact that a
minimal positive basis requires only $n + 1$ vectors [7], as opposed to the $2n$ coordinate vectors used in most
traditional pattern search methods, including compass search and Hooke and Jeeves.
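
To illustrate what a minimal positive basis looks like, the sketch below uses one common choice, the $n$ coordinate vectors together with the negative of their sum, and shows how any vector is written as a nonnegative combination of these $n + 1$ vectors. Whether NLessSearch uses exactly this basis is not stated here, so the construction and the helper names are assumptions.

```python
def minimal_positive_basis(n):
    """The n + 1 vectors e_1, ..., e_n and -(e_1 + ... + e_n)."""
    basis = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    basis.append([-1.0] * n)
    return basis


def nonnegative_coefficients(x):
    """Nonnegative weights expressing x in the minimal positive basis."""
    t = max(0.0, max(-xi for xi in x))  # weight on -(e_1 + ... + e_n)
    return [xi + t for xi in x] + [t]


def combine(coeffs, vectors):
    """Linear combination of the given vectors with the given weights."""
    n = len(vectors[0])
    return [sum(c * v[i] for c, v in zip(coeffs, vectors)) for i in range(n)]
```

For `x = [2.0, -3.0, 0.5]` the weights are all nonnegative and `combine(nonnegative_coefficients(x), minimal_positive_basis(3))` returns `x` again, using four vectors where the compass pattern would use six.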

5.2. Measuring stationarity. The first question we ask is how effective is A* as a measure of station- 
arity? Not too surprisingly, the results of our tests showed that A*, is a reliable measure of progress toward 
a solution. Furthermore, our numbers make quite clear the r-linear behavior of the sequence of unsuccessful 
iterates. 

After any unsuccessful iteration, a pattern search method is required to reduce Δ_k. We used the standard
reduction factor of 1/2, so that after an unsuccessful iteration, Δ_{k+1} = (1/2) Δ_k. Before proceeding to the next
iteration, we recorded the values of Δ_k, ||∇f(x_k)||, |f(x_k) - f(x*)|, and ||x_k - x*|| (though since we knew
x* = 0, we simply had to compute ||x_k||). Representative results from two particular tests are given in
Tables 5.1 and 5.2.
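
The recording procedure just described can be sketched as follows. This is a simplified compass search written for illustration, assuming that each iteration polls all 2n points x ± Δ e_i, moves to the best one if it gives simple decrease, and otherwise records the four quantities and halves Δ. The names compass_search and f_star, and the small quadratic used to exercise it, are assumptions and are not taken from the software in [5].

```python
import math

def compass_search(f, grad, x0, delta0, tol, f_star=0.0):
    """Compass search on f; returns (x, records), one record per unsuccessful iteration.

    Each record is (delta_k, ||grad f(x_k)||, |f(x_k) - f_star|, ||x_k||);
    the last entry is the error when the minimizer is the origin.
    """
    x, fx, delta = list(x0), f(x0), delta0
    records = []
    while delta >= tol:
        best, fbest = None, fx
        for i in range(len(x)):          # poll the 2n points x +/- delta * e_i
            for sign in (1.0, -1.0):
                trial = list(x)
                trial[i] += sign * delta
                ft = f(trial)
                if ft < fbest:
                    best, fbest = trial, ft
        if best is not None:             # successful iteration: move, keep delta
            x, fx = best, fbest
        else:                            # unsuccessful iteration: record, then halve
            records.append((delta,
                            math.sqrt(sum(g * g for g in grad(x))),
                            abs(fx - f_star),
                            math.sqrt(sum(xi * xi for xi in x))))
            delta *= 0.5
    return x, records


# Illustrative quadratic with minimizer at the origin (not a test function from [5]).
f = lambda x: 2.0 * x[0] ** 2 + x[0] * x[1] + x[1] ** 2
grad = lambda x: [4.0 * x[0] + x[1], x[0] + 2.0 * x[1]]
x_final, records = compass_search(f, grad, [1.0, -0.7], 0.5, 1e-6)
for delta, gnorm, ferr, xerr in records[:5]:
    print(f"{delta:.6e}  {gnorm:.6e}  {ferr:.6e}  {xerr:.6e}")
```

Each printed row has the same layout as a row of the tables that follow: the step-length parameter, the gradient norm, the error in the function value, and the distance to the minimizer.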


Table 5.1

NLessSearch in five variables

Δ_k                ||∇f(x_k)||        |f(x_k) - f(x*)|   ||x_k - x*||
0.696226813823902  3.868198250940641  0.97865190917639   0.277727609425140
0.348113406911951  1.774568869170108  0.29434050891286   0.217654912583381
0.174056703455976  1.453229516907024  0.14479398150378   0.108469794156683
0.087028351727988  0.281730564593278  0.02631186626609   0.115903974197448
0.043514175863994  0.294368401951189  0.01457817127405   0.070417918465458
0.021757087931997  0.139178781830330  0.00462725481091   0.040571903721886
0.010878543965999  0.029890455941604  0.00073483024738   0.034760919692137
0.005439271982999  0.029890455941604  0.00073483024738   0.034760919692137
0.002719635991500  0.026442509744175  0.00033346616931   0.024811502814971
0.001359817995750  0.012307380822946  0.00022911316873   0.021849036278915
0.000679908997875  0.009511207300009  0.00014689251267   0.017596624227294
0.000339954498938  0.005426023491341  0.00004796415094   0.010013823809977
0.000169977249469  0.002065061853736  0.00000720339894   0.003890485486089
0.000084988624734  0.001038174548727  0.00000179297966   0.001938237809581
0.000042494312367  0.000435621809509  0.00000034070778   0.000849356148369
0.000021247156184  0.000231605471572  0.00000008825774   0.000432171965702
0.000010623578092  0.000113220891634  0.00000001411259   0.000168890905210
0.000005311789046  0.000067613953149  0.00000000577676   0.000108943399928

Table 5.2

HJSearch in five variables

Δ_k                ||∇f(x_k)||        |f(x_k) - f(x*)|   ||x_k - x*||
0.696226813823902  3.718628968450993  3.96639084353257   2.396301558944381
0.348113406911951  1.370661155865317  0.44618879006458   0.698389592313846
0.174056703455976  0.993386770046628  0.19091214014793   0.450386903073632
0.087028351727988  0.236893510661273  0.01477525409286   0.153943970082610
0.043514175863994  0.314026005456998  0.01421309666224   0.119315177505950
0.021757087931997  0.131650296045321  0.00223337373949   0.034002609804365
0.010878543965999  0.042526372212693  0.00028996796577   0.015791910616849
0.005439271982999  0.032921235371376  0.00018078437086   0.014678346820778
0.002719635991500  0.012854930063180  0.00014567060113   0.016582990396810
0.001359817995750  0.005667414556147  0.00001023084696   0.003757596625046
0.000679908997875  0.004101406209192  0.00000429756349   0.002612852810391
0.000339954498938  0.001396318029208  0.00000050775161   0.000854609846084
0.000169977249469  0.000833651146770  0.00000049903818   0.000985750547712
0.000084988624734  0.000563050121378  0.00000002356890   0.000074244150774
0.000042494312367  0.000112117511534  0.00000000325088   0.000043510325021
0.000021247156184  0.000097664692564  0.00000000266601   0.000032689236837
0.000010623578092  0.000035578092711  0.00000000026108   0.000013878637584
0.000005311789046  0.000010624256256  0.00000000015362   0.000017458315183


The purpose of the results we report in Tables 5.1 and 5.2 is not to demand close scrutiny of each entry, 
but rather to demonstrate the trends in each of the four quantities measured. We clearly see the r-linear 
behavior the analysis tells us to expect: by the time we halve Δ_k, we have roughly halved the error in the
solution. (Note that in Table 5.1 we have two consecutive unsuccessful iterations: the seventh and eighth
rows differ only in Δ_k.)
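
One way to read the r-linear trend off the data is to check that the error is bounded by a constant multiple of the geometrically decreasing sequence Δ_k = Δ_0 / 2^k. The sketch below does this for the last column of Table 5.1. The constant C it computes is purely empirical, taken from this one run; it is an assumption of the illustration and is not the constant appearing in Theorem 4.4.

```python
# ||x_k - x*|| at the 18 unsuccessful iterations reported in Table 5.1.
errors = [
    0.277727609425140, 0.217654912583381, 0.108469794156683,
    0.115903974197448, 0.070417918465458, 0.040571903721886,
    0.034760919692137, 0.034760919692137, 0.024811502814971,
    0.021849036278915, 0.017596624227294, 0.010013823809977,
    0.003890485486089, 0.001938237809581, 0.000849356148369,
    0.000432171965702, 0.000168890905210, 0.000108943399928,
]
delta0 = 0.696226813823902                     # first entry of the Delta_k column

# Delta_k is halved after every unsuccessful iteration.
deltas = [delta0 / 2 ** k for k in range(len(errors))]
ratios = [e / d for e, d in zip(errors, deltas)]

# Smallest constant C with ||x_k - x*|| <= C * Delta_k over the whole table.
C = max(ratios)
print(f"empirical bound: error <= {C:.2f} * delta_k")
```

The ratio is not constant from row to row, which is consistent with r-linear rather than q-linear convergence: the errors need not decrease monotonically, only stay below a geometric envelope.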

We report here the results from only two experiments, but these are representative of results from ten 
thousand runs over multiple quadratics, in multiple dimensions, from multiple starting points, with multiple 
choices of Δ_0, using four different pattern search methods. We found that across all these tests Δ_k gave us
a consistent measure of the accuracy of the solution. 

One beauty of using Δ_k as a measure of stationarity is that it is perfectly natural in the context of
pattern search; no additional computation is required. Another strong reason for using Δ_k as a measure
of stationarity is that it is remarkably insusceptible to error. Once an initial choice of Δ_k has been made,
the only possible numerical error occurs when converting the decimal representation to binary. Since most
implementations use the value of 1/2 to reduce Δ_k, which requires only a simple binary shift, there are no
additional numerical issues to cloud the computation of this measure. We close with one final observation
about the practical utility of Δ_k as a measure of stationarity. A useful feature of pattern search is
that one requires only ranking, or order, information to drive the search; no numeric values for the function
are necessary [7]. In such a setting, Δ_k is a feasible measure of progress whereas measures based on the
numeric values of the objective function are not.
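
The claim that halving introduces no further rounding error can be checked directly. The sketch below assumes IEEE 754 binary floating point (as used by Python floats) and no underflow; it halves the initial value from Tables 5.1 and 5.2 seventeen times and compares the result with exact rational arithmetic applied to the stored initial value.

```python
from fractions import Fraction

delta0 = 0.696226813823902           # initial value from Tables 5.1 and 5.2
delta = delta0
for _ in range(17):
    delta *= 0.5                     # floating-point halving, as in the search

# Fraction(float) converts the stored binary value exactly, so this is the
# exact result of dividing the stored delta0 by 2**17.
exact = Fraction(delta0) / 2 ** 17
is_exact = Fraction(delta) == exact
print(is_exact, delta)
```

The only discrepancy from the decimal value typed by the user is therefore the one incurred when delta0 is first converted to binary.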

6. How many successful iterates? Theorem 4.4 says that the subsequence of unsuccessful iterates 
converges r-linearly once we are in the neighborhood of a solution. A natural question to then ask is: how 
many iterations occur in practice between each iterate included in this subsequence? This question is more 
subtle, and our results are less conclusive. Again, we illustrate our findings with only a few specific examples 
in Figs. 6.2-6.5.

In all instances, we give a final stopping tolerance of 2 × 10^-8; i.e., we terminate the search when
Δ_k < 2 × 10^-8. Along the horizontal axis, we list the number of unsuccessful iterations; i.e., the number
of times we halve Δ_k before it is less than the stopping tolerance. Each bar then represents the number of
successful iterations that preceded an unsuccessful iteration plus the (single) unsuccessful iteration so that 
summing all the entries gives us the total number of iterations for the search. 
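
The length of the horizontal axis is fixed by Δ_0 and the tolerance alone. As a small sketch, assuming that Δ is changed only by halving after unsuccessful iterations, the number of bars can be counted as follows; the function name unsuccessful_iterations and the value Δ_0 = 1 are chosen here for illustration.

```python
def unsuccessful_iterations(delta0, tol=2e-8):
    """Count the halvings of delta performed until delta < tol."""
    count = 0
    delta = delta0
    while delta >= tol:
        delta *= 0.5
        count += 1
    return count


print(unsuccessful_iterations(1.0))
```

With Δ_0 = 1 the loop runs 26 times, since 2^-25 is still above 2 × 10^-8 and 2^-26 is below it.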

Notice that for the three algorithms we tested, the scale on the vertical axes varies considerably. For 
NLessSearch, the number of successful iterations preceding an unsuccessful iteration can be considerably 
higher than, say, for HJSearch, but overall, the results are inconclusive. We cannot predict how many 
successful iterations may precede an unsuccessful iteration, nor does there seem to be any particular trend.

The only substantive observation we can extract from our tests is illustrated by the example shown in 
Fig. 6.1. 


[Figure: three bar-chart panels, each titled "Number of Local Search Iterations for each Decrease in Delta".]

Fig. 6.1. NLess, Compass, & HJ in 4 variables


Here the explanation for the huge number of successful iterations before Δ_k is ever reduced seems to be
straightforward. The initial choice of Δ_0, which is drawn randomly, is so small (0.001128116614106) that
initially there is a long sequence of successful iterations, but progress is remarkably slow because all
the trial steps are quite short. The obvious conjecture to make is
that it is better to start with a reasonably large choice of Δ_0, a practical issue that is the subject of another
study in preparation.
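
The effect of a very small Δ_0 is easy to reproduce on a toy problem. The sketch below counts the successful compass-search iterations taken before Δ would first have to be reduced, once with the small value quoted above and once with Δ_0 = 1. The objective (a sum of squares in two variables), the starting point, and the function name are assumptions made for illustration; they are not the test problem behind Fig. 6.1.

```python
def successes_before_first_reduction(f, x0, delta):
    """Count successful compass-search iterations before delta must first be halved."""
    x, fx, successes = list(x0), f(x0), 0
    while True:
        best, fbest = None, fx
        for i in range(len(x)):          # poll the 2n points x +/- delta * e_i
            for sign in (1.0, -1.0):
                trial = list(x)
                trial[i] += sign * delta
                ft = f(trial)
                if ft < fbest:
                    best, fbest = trial, ft
        if best is None:                 # no improvement: delta would now be halved
            return successes
        x, fx, successes = best, fbest, successes + 1


f = lambda x: x[0] ** 2 + x[1] ** 2      # illustrative objective, minimizer at the origin
small = successes_before_first_reduction(f, [1.0, 1.0], 0.001128116614106)
large = successes_before_first_reduction(f, [1.0, 1.0], 1.0)
print(small, large)
```

With the small step, the search must walk to the neighborhood of the minimizer in increments of about 10^-3, so the count of successful iterations is in the thousands; with the larger step it reaches the minimizer in two moves.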


[Figure: three bar-chart panels, each titled "Number of Local Search Iterations for each Decrease in Delta"; horizontal axis: delta decrease.]

Fig. 6.2. NLess, Compass, & HJ in 8 variables



[Figure: three bar-chart panels, each titled "Number of Local Search Iterations for each Decrease in Delta"; horizontal axis: delta decrease.]

Fig. 6.3. NLess, Compass, & HJ in 8 variables

[Figure: three bar-chart panels, each titled "Number of Local Search Iterations for each Decrease in Delta"; horizontal axis: delta decrease.]

Fig. 6.4. NLess, Compass, & HJ in 4 variables


[Figure: three bar-chart panels, each titled "Number of Local Search Iterations for each Decrease in Delta"; horizontal axis: delta decrease.]

Fig. 6.5. NLess, Compass, & HJ in 4 variables


7. Conclusion. The results given here round out the convergence analysis of pattern search. The 
analysis and numerical experiments reported here show that Δ_k can be used as a reliable stopping criterion.
Moreover, these tests show that the correlations predicted by Theorems 3.1 and 4.4 between Δ_k, ||∇f(x_k)||,
and ||x_k - x*|| are manifest in practice. These results vindicate the intuition of the early developers of
direct search methods. 

Acknowledgments. We are indebted to Natalia Alexandrov for a conversation that led to the term 
“desultory convergence” in connection with Theorem 4.4. We also thank Stephen Nash for a lively discussion 
about stopping criteria. 